
ARROW-5955: [Plasma] Support setting memory quotas per plasma client for better isolation #4885

Closed

ericl wants to merge 1 commit into apache:master from ericl:plasma-lru

Conversation

@ericl (Contributor) commented Jul 16, 2019

Currently, plasma evicts objects according to a single global LRU queue. In Ray, this often causes memory-intensive workloads to fail unpredictably, since a client that creates objects at a high rate can evict objects created by clients at lower rates. This happens even when the true working set of both clients is quite small.

This PR adds

  • support for configuring per-client LRU eviction queues at runtime
  • improved debugging of memory usage of clients

Details on the implementation:

QuotaAwarePolicy extends the basic eviction policy to implement per-client memory quotas. This effectively gives each client its own LRU queue, which caps its memory usage and protects this memory from being evicted by other clients.

The quotas are enforced when objects are first created, by evicting the necessary number of objects from the client's own LRU queue to cap its memory usage. Once that is done, allocation is handled by the normal eviction policy. This may result in the eviction of objects from the global LRU queue, if not enough memory can be allocated even after the evictions from the client's own LRU queue.
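The per-client enforcement step can be sketched in Python. This is a minimal illustration of the idea only, not the actual C++ implementation in cpp/src/plasma/quota_aware_policy.cc; the class and method names here are hypothetical:

```python
from collections import OrderedDict

class ClientQueue:
    """Sketch of a per-client LRU queue with a byte quota
    (hypothetical names; the real policy is implemented in C++)."""

    def __init__(self, quota_bytes):
        self.quota = quota_bytes
        self.objects = OrderedDict()  # object_id -> size, oldest first
        self.used = 0

    def add(self, object_id, size):
        """Insert an object; return the ids evicted from this client's
        own queue so that its usage stays under its quota."""
        evicted = []
        # Evict this client's oldest objects first, never touching
        # other clients' queues.
        while self.objects and self.used + size > self.quota:
            old_id, old_size = self.objects.popitem(last=False)
            self.used -= old_size
            evicted.append(old_id)
        self.objects[object_id] = size
        self.used += size
        return evicted
```

With a 100-byte quota, adding a second 60-byte object evicts the first from the client's own queue, leaving other clients' objects untouched; any further shortfall in the store would then be handled by the normal global eviction policy.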

Some special cases:

  • When a pinned object is "evicted" from a per-client queue, it is instead transferred into the global LRU queue.
  • When a client disconnects, its LRU queue is merged into the head of the global LRU queue.
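Both special cases can be sketched as follows (again a hypothetical Python illustration; the function names are invented and the real logic lives in the C++ QuotaAwarePolicy):

```python
from collections import OrderedDict

# Global LRU queue shared by all clients; oldest entries (the head)
# are evicted first.
global_lru = OrderedDict()  # object_id -> size


def evict_from_client(client_queue, object_id, pinned):
    """When a pinned object would be evicted from a per-client queue,
    transfer it into the global LRU queue instead of freeing it."""
    size = client_queue.pop(object_id)
    if pinned:
        global_lru[object_id] = size  # transferred, not freed
        return None
    return object_id  # actually evicted


def on_client_disconnect(client_queue):
    """Merge a disconnecting client's queue into the head (oldest end)
    of the global LRU queue, so its objects are first in line for
    eviction."""
    merged = OrderedDict()
    merged.update(client_queue)  # client's objects become the oldest
    merged.update(global_lru)
    global_lru.clear()
    global_lru.update(merged)
```

Merging at the head rather than the tail means a departed client's objects do not displace the live working sets of the remaining clients.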

cc @pcmoritz


This is nice!

@pcmoritz (Contributor) left a comment:

Looks great! Some small comments above. Also, am I correct to assume that if no quotas are set, the behavior is unchanged from before?

@ericl (Contributor, Author) commented Jul 16, 2019

That's right, if no clients set a quota, the behaviour should be identical to before.

@codecov-io commented Jul 18, 2019

Codecov Report

Merging #4885 into master will increase coverage by 1.63%.
The diff coverage is 95.03%.


@@            Coverage Diff             @@
##           master    #4885      +/-   ##
==========================================
+ Coverage   87.45%   89.09%   +1.63%     
==========================================
  Files         994      719     -275     
  Lines      140389   100613   -39776     
  Branches     1418        0    -1418     
==========================================
- Hits       122778    89641   -33137     
+ Misses      17249    10972    -6277     
+ Partials      362        0     -362
Impacted Files Coverage Δ
cpp/src/plasma/store.h 100% <ø> (ø) ⬆️
cpp/src/plasma/client.h 100% <ø> (ø) ⬆️
cpp/src/plasma/protocol.h 100% <ø> (ø) ⬆️
python/pyarrow/_plasma.pyx 89.28% <0%> (-2.82%) ⬇️
cpp/src/plasma/store.cc 91.33% <100%> (+0.19%) ⬆️
cpp/src/plasma/protocol.cc 94.29% <100%> (+0.52%) ⬆️
cpp/src/plasma/quota_aware_policy.cc 100% <100%> (ø)
cpp/src/plasma/quota_aware_policy.h 100% <100%> (ø)
cpp/src/plasma/eviction_policy.h 100% <100%> (ø) ⬆️
cpp/src/plasma/plasma.h 100% <100%> (ø) ⬆️
... and 286 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 2746a23...f3638b8.

3 participants